Skip to content

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229

Open
gibber9809 wants to merge 6 commits into
y-scope:mainfrom
gibber9809:refactor-output-handler-options
Open

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229
gibber9809 wants to merge 6 commits into
y-scope:mainfrom
gibber9809:refactor-output-handler-options

Conversation

@gibber9809
Copy link
Copy Markdown
Contributor

@gibber9809 gibber9809 commented Apr 28, 2026

Description

This PR reworks how we parse arguments for output destinations during search in order to make it easy to support having multiple output destinations in a follow-up PR.

Each output destination is treated like a subcommand, e.g. <output-destination-type> <output-destination-arguments>. Previously, when supporting only one output destination, we would simply collect the unrecognized options after parsing the main search options and hand those options to an options_description specific to the relevant output destination. With multiple output destinations though we need to do a little bit of extra work to figure out which unrecognized options are related to each subcommand.

This PR solves that problem by parsing the unrecognized options left to right, using both the known output destination names and the options_description object for each subcommand to attempt to find the options for each output destination. After this step the options can be parsed with each destination-specific options_description like normal. We can't simply iteratively use each options_description with allow_unregistered to parse the full list of options, because different output destinations share options with the same name (e.g. --host is used both by reducer and network), so some combinations of output destinations could result in throwing errors because of repeated options. As well, we can't simply rely on splitting up the options for each subcommand based solely on the ouptut destination names, because the output destination names can also appear as arguments (e.g. in the package, because of the way our DNS is set up, a reducer query will look like clp-s s <archive> <query> reducer --host reducer --port <port> --job-id <id> --count.

This PR also refactors the CommandLineArguments class to group together options for each output destination into structs.

This PR includes a small functional change, in that aggregation options are now part of the reducer output destination options. This changes how aggregation options must be specified on the command line (i.e. they explicitly need to be part of the reducer output destination arguments), but doesn't represent any change in functionality since aggregation options are only allowed for the reducer output destination anyway.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Confirmed that package search through webui and cli work as expected
  • Manually validated that options are split as expected when passing multiple output destinations (before the command is rejected for having multiple output destinations)

Summary by CodeRabbit

  • Bug Fixes

    • Stricter validation and error handling for output handler arguments, including treating stdout as no-options handler.
    • Search now rejects unsupported reducer usage in certain output scenarios.
  • Refactor

    • Command-line argument handling restructured to provide per-output-handler option groups and typed option objects.
    • Aggregation flags moved to reducer-specific handler scope and repositioned in constructed reducer commands.

@gibber9809 gibber9809 requested a review from a team as a code owner April 28, 2026 21:21
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Output handler configuration moved from many individual getters to typed option structs; CLI parsing now groups trailing tokens into per-handler subcommands and requires exactly one handler. Aggregation flags were relocated into reducer-specific options and validation; runtime code and command assembly updated to use the new option model.

Changes

Output handler & CLI parsing

Layer / File(s) Summary
Option Types / API
components/core/src/clp_s/CommandLineArguments.hpp
Adds AggregationType enum and four output-handler option structs; replaces many scalar getters with get_*_output_handler_options() getters; updates helper signatures to take std::vector<std::string> const& and adjusts includes.
CLI Parsing / Subcommand Splitting
components/core/src/clp_s/CommandLineArguments.cpp
Parses unrecognized trailing tokens into per-output-handler “subcommand” argument groups; requires exactly one handler; implements parse_*_output_handler_options per handler; moves --count/--count-by-time into reducer option parsing and validates reducer-specific constraints; treats stdout as handler with no options.
Application Wiring
components/core/src/clp_s/clp-s.cpp
Creates output handlers and configures reducer aggregation by consuming the new option structs; switches aggregation branching to options.aggregation_type; localizes option variables in case blocks.
Search Aggregation Check
components/core/src/clp_s/kv_ir_search.cpp
Changes condition to return CountSupportNotImplemented when selected output handler is Reducer rather than based on former aggregation-flag methods.
Command Assembly for Jobs
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Appends aggregation flags (--count, --count-by-time and bucket size) after reducer argument block when building the reducer command.
Tests/Docs/Support
components/core/src/clp_s/*
Parsing helpers, validation logic, and related doc comments updated alongside the new structured option model.

Build Configuration

Layer / File(s) Summary
CMake sources
components/core/src/clp_s/CMakeLists.txt
Removes ../clp/cli_utils.cpp and ../clp/cli_utils.hpp from CLP_S_CLP_SOURCES list (no other CMake changes).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (argv)
    participant Parser as CommandLineArguments
    participant App as clp-s
    participant Search as kv_ir_search
    participant Job as fs_search_task.py
    CLI->>Parser: provide argv (including trailing handler tokens)
    Parser->>Parser: split trailing tokens into handler subcommand args
    Parser-->>App: return selected OutputHandlerOptions (exactly one)
    App->>Search: initiate search with selected handler type
    Search-->>App: responds with aggregation support check (uses handler type)
    App->>Job: when dispatching reducer jobs, pass reducer args then append aggregation flags
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: refactoring command-line argument parsing for output destinations to improve extensibility, which is the core objective throughout all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.12)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

Unexpected Ruff issue shape at index 6

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/CommandLineArguments.cpp`:
- Around line 184-211: The loop currently defaults
get_max_tokens_for_option(...).value_or(0) to 0 which lets an option with
unknown arity cause the next token (which might match a subcommand name) to be
misinterpreted as a subcommand; change the logic in the branch that sets
remaining_tokens_in_current_argument so that when
get_max_tokens_for_option(current_subcommand_options_description, option)
returns std::nullopt you handle it explicitly (e.g., throw
std::invalid_argument("unrecognized option \"...\"") or otherwise treat it as an
error) instead of silently defaulting to 0; update code around
remaining_tokens_in_current_argument, current_subcommand,
current_subcommand_arguments and collect_subcommand to rely on that explicit
handling.
- Around line 993-995: The thrown std::invalid_argument in
CommandLineArguments.cpp currently says "clp-s only supports one output handler
at a time"; change this to include the actual detected handlers from
output_options_map so users see which handlers were passed. Locate the check
using output_options_map (the one that throws when output_options_map.size() >
1) and build a descriptive message by joining the map keys / handler names
(e.g., iterate output_options_map and collect the keys into a comma-separated
string) and include that string in the exception text so the thrown message
shows the list of detected handlers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c84e61f7-19bf-4f9e-90b9-3b0851f0bfab

📥 Commits

Reviewing files that changed from the base of the PR and between aa28a0c and 359ab95.

📒 Files selected for processing (6)
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp_s/CommandLineArguments.cpp
  • components/core/src/clp_s/CommandLineArguments.hpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
  • components/core/src/clp_s/CMakeLists.txt

Comment thread components/core/src/clp_s/CommandLineArguments.cpp
Comment on lines +993 to 995
if (output_options_map.size() > 1) {
throw std::invalid_argument("clp-s only supports one output handler at a time");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider clarifying the single-handler constraint.

The error message could be more helpful by listing which handlers were detected. This would aid debugging when users accidentally specify multiple handlers.

💡 Optional enhancement for better error diagnostics
             if (output_options_map.size() > 1) {
-                throw std::invalid_argument("clp-s only supports one output handler at a time");
+                std::string handlers;
+                for (auto const& [name, _] : output_options_map) {
+                    if (false == handlers.empty()) {
+                        handlers += ", ";
+                    }
+                    handlers += name;
+                }
+                throw std::invalid_argument(fmt::format(
+                        "clp-s only supports one output handler at a time, but found: {}",
+                        handlers
+                ));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/CommandLineArguments.cpp` around lines 993 - 995,
The thrown std::invalid_argument in CommandLineArguments.cpp currently says
"clp-s only supports one output handler at a time"; change this to include the
actual detected handlers from output_options_map so users see which handlers
were passed. Locate the check using output_options_map (the one that throws when
output_options_map.size() > 1) and build a descriptive message by joining the
map keys / handler names (e.g., iterate output_options_map and collect the keys
into a comma-separated string) and include that string in the exception text so
the thrown message shows the list of detected handlers.

@gibber9809 gibber9809 requested a review from LinZhihao-723 May 3, 2026 21:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`:
- Around line 154-155: The code currently appends "--count" when
aggregation_config.do_count_aggregation is not None, which incorrectly includes
False; change the guard to only append the flag for a truthy value (e.g., use if
aggregation_config.do_count_aggregation: or if
aggregation_config.do_count_aggregation is True:) so that --count is only added
when do_count_aggregation is true; update the check around the
command.append("--count") in the code that constructs the command (refer to
aggregation_config.do_count_aggregation and the command list where
command.append("--count") is called).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d58fee1a-8c86-4133-bfe9-aded68e6283d

📥 Commits

Reviewing files that changed from the base of the PR and between 359ab95 and feb23c7.

📒 Files selected for processing (2)
  • components/core/src/clp_s/CMakeLists.txt
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
💤 Files with no reviewable changes (1)
  • components/core/src/clp_s/CMakeLists.txt

Comment on lines +154 to +155
if aggregation_config.do_count_aggregation is not None:
command.append("--count")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find aggregation config model definition =="
rg -n -C3 'do_count_aggregation|count_by_time_bucket_size|class .*Aggregation' --type=py

echo
echo "== Find assignments/usages that may pass False =="
rg -n -C2 'do_count_aggregation\s*=' --type=py

echo
echo "== Find command construction sites for --count =="
rg -n -C3 '"--count"|\'\''--count'\''' --type=py

Repository: y-scope/clp

Length of output: 11193


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check for explicit False assignments to do_count_aggregation =="
rg -n 'do_count_aggregation\s*[:=]\s*False' --type=py

echo
echo "== Check instantiation patterns for AggregationConfig =="
rg -n -A2 'AggregationConfig(' --type=py | head -100

echo
echo "== Check the full context of where --count flag is constructed =="
rg -n -B5 -A1 'append.*--count' --type=py

Repository: y-scope/clp

Length of output: 122


The --count flag guard should check truthiness, not non-None.

Line 154 currently adds --count whenever do_count_aggregation is not None, which includes when it is explicitly False. Since the field is typed as bool | None, False is a valid distinct state, and the flag should only be appended for truthy values.

Proposed fix
-        if aggregation_config.do_count_aggregation is not None:
+        if aggregation_config.do_count_aggregation:
             command.append("--count")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if aggregation_config.do_count_aggregation is not None:
command.append("--count")
if aggregation_config.do_count_aggregation:
command.append("--count")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`
around lines 154 - 155, The code currently appends "--count" when
aggregation_config.do_count_aggregation is not None, which incorrectly includes
False; change the guard to only append the flag for a truthy value (e.g., use if
aggregation_config.do_count_aggregation: or if
aggregation_config.do_count_aggregation is True:) so that --count is only added
when do_count_aggregation is true; update the check around the
command.append("--count") in the code that constructs the command (refer to
aggregation_config.do_count_aggregation and the command list where
command.append("--count") is called).

Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have comments on the parsing logic. I think how we store the parsed options is the most important decision to make in this PR, since it affects how C-APIs will be written.
One way is to do the variant thing I mentioned in another comment.
The other way is we expose different entry points for search, each takes a particular config option type and initializes the output handler accordingly.
I think for now, it makes more sense to use the variant solution as it keeps the C++ version cleaner: one config variant variable instead of one variant enum + four config objects stored inside the master config.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the API design perspective (to make it future-proof), I think it might make sense if we:

  • Make the output handler config a variant of the four different types.
  • The master config only contains one variant variable, and it must be one of the four.
  • Replace the switch statement with a variant visitor.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review the coding-style thing... the functionality looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants